Abstract. A minimum dominating set for a digraph (directed graph) is
a smallest set of vertices such that each vertex either belongs to this set or
has at least one parent vertex in this set. We solve this hard combinatorial
optimization problem approximately by a local algorithm of generalized
leaf removal and by a message-passing algorithm of belief propagation.
These algorithms can construct near-optimal dominating sets or even
exact minimum dominating sets for random digraphs and also for real-world
digraph instances. We further develop a core percolation theory
and a replica-symmetric spin glass theory for this problem. Our algorithmic
and theoretical results may facilitate applications of dominating sets
to various network problems involving directed interactions.

1 Introduction

The construction of a minimum dominating set (MDS) for a general digraph
(directed graph) [1,2] is a fundamental nondeterministic polynomial-hard
(NP-hard) combinatorial optimization problem [3]. A digraph D = {V, A} is formed
by a set V = {1, 2, ..., N} of N vertices and a set A = {(i, j) : i, j ∈ V} of
M arcs (directed edges), each arc (i, j) pointing from a parent vertex
(predecessor) i to a child vertex (successor) j. The arc density a is defined
simply as a = M/N. Each vertex i of digraph D brings a constraint requiring that
either i belongs to a vertex set Γ or at least one of its predecessors belongs
to Γ. A dominating set Γ is therefore a vertex set which satisfies all the N
vertex constraints, and the dominating set problem can be regarded as a special
case of the more general hitting set problem [4,5].

A dominating set containing the smallest number of vertices is an MDS, which
is not necessarily unique for a digraph D. As an MDS is a smallest set of
vertices which has directed edges to all the other vertices of a given digraph,
it is conceptually and practically important for analyzing, monitoring, and
controlling many directed interaction processes in complex networked systems,
such as infectious disease spreading [6], genetic regulation, chemical reaction
and metabolic regulation [8], and power generation and transportation. Previous
heuristic algorithms on the directed MDS problem all came from the computer
science/applied mathematics communities [2] and they are based on vertices'
local properties such as in- and out-degrees. In the present work we
study the directed MDS problem through statistical mechanical approaches.

In the next section we introduce a generalized leaf-removal (GLR) process
to simplify an input digraph D. If GLR reduces the original digraph D into an
empty one, it succeeds in constructing an exact MDS. If a core is left behind,
we implement a hybrid algorithm combining GLR with an impact-based
greedy process to search for near-optimal dominating sets (see Fig. 3 and
Table 1). We also study the GLR-induced core percolation by a mean field theory
(see Fig. 2). In Sec. 3 we introduce a spin glass model for the directed MDS
problem and obtain a belief-propagation decimation (BPD) algorithm based on
the replica-symmetric mean field theory. By comparing with ensemble-averaged
theoretical results, we demonstrate that the message-passing BPD algorithm has
excellent performance on random digraphs and real-world network instances, and
that it outperforms the local hybrid algorithm (Fig. 3 and Table 1).

This paper is a continuation of our earlier effort [13], which studied the
undirected MDS problem. Since each undirected edge between two vertices i and j
can be treated as two opposite-direction arcs (i, j) and (j, i), the methods of
this paper are more general and they are applicable to graphs with both directed
and undirected edges. The algorithmic and theoretical results presented here
and in [13] may promote the application of dominating sets to various network
problems involving directed and undirected interactions.

In the remainder of this paper, we denote by $\partial i^+$ the set of
predecessors of a vertex i, and refer to the size of this set as the in-degree
of i; similarly $\partial i^-$ denotes the set of successors of vertex i and its
size defines the out-degree of this vertex. With respect to a dominating set Γ,
if vertex i belongs to this set, we say i is occupied, otherwise it is
unoccupied (empty). If vertex i belongs to the dominating set Γ or at least one
of its predecessors belongs to Γ, then we say i is observed, otherwise it is
unobserved.
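
The notation above maps directly onto two adjacency maps. The following is a
minimal Python sketch, not the authors' code: the function names
build_neighbor_sets and is_dominating_set are illustrative, and vertices are
labeled 0..N-1 instead of 1..N. It stores the predecessor and successor sets
and checks whether every vertex is observed by a candidate set.

```python
def build_neighbor_sets(n, arcs):
    """Return (pred, succ) for a digraph on vertices 0..n-1.

    pred[i] is the set of predecessors of i, succ[i] the set of successors.
    """
    pred = [set() for _ in range(n)]
    succ = [set() for _ in range(n)]
    for i, j in arcs:          # arc (i, j) points from parent i to child j
        succ[i].add(j)
        pred[j].add(i)
    return pred, succ


def is_dominating_set(n, arcs, gamma):
    """True if every vertex is in gamma or has a predecessor in gamma."""
    pred, _ = build_neighbor_sets(n, arcs)
    gamma = set(gamma)
    return all(i in gamma or bool(pred[i] & gamma) for i in range(n))


arcs = [(0, 1), (0, 2), (2, 3)]
print(is_dominating_set(4, arcs, {0, 2}))  # all four vertices are observed
print(is_dominating_set(4, arcs, {0}))     # vertex 3 stays unobserved
```

In this example vertex 0 has no predecessor, so it must belong to every
dominating set of the digraph.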


2 Generalized Leaf Removal and the Hybrid Algorithm 

The leaf-removal process was initially applied in the vertex-cover problem.
It causes a core percolation phase transition in random undirected or directed
graphs. Here we consider a generalized leaf-removal process for the directed
MDS problem. This GLR process iteratively deletes vertices and arcs from an
input digraph D, starting from all the N vertices being unoccupied (and
unobserved) and the dominating set Γ being empty. The microscopic rules of
digraph simplification are as follows:

Rule 1: If an unobserved vertex i has no predecessor in the current digraph
D, it is added to set Γ and becomes occupied (see Fig. 1A). All the previously
unobserved successors of i then become observed.

Fig. 1. The generalized leaf-removal process. White circles represent unobserved
vertices, black circles are occupied vertices, and blue (gray) circles are
observed but unoccupied vertices. Pink (light gray) arrows represent deleted
arcs, while black arrows are arcs that are still present in the digraph.
(A) Vertex i has no predecessor, so it is occupied. (B) Vertex j has only one
predecessor k and no successor, so vertex k is occupied. (C) Vertex l has only
a single unobserved successor m, so the arc (l, m) is deleted.


Rule 2: If an unobserved vertex j has only a single unoccupied predecessor
(say vertex k) and no unobserved successor in the current digraph D, vertex k
is added to set Γ and becomes occupied (Fig. 1B). All the previously unobserved
successors of k (including j) then become observed.

Rule 3: If an unoccupied but observed vertex l has only a single unobserved
successor (say m) in the current digraph D, occupying l is not better than
occupying m, therefore the arc (l, m) is deleted from D (Fig. 1C). We emphasize
that vertex m is still unobserved after this arc deletion. (Rule 3 is specific
to the dominating set problem and is absent in the conventional leaf-removal
process.)

The above-mentioned microscopic rules only involve the local structure of
the digraph, so they are simple to implement. Following the same line of
reasoning as in [13], we can prove that if all the vertices are observed after
the GLR process, the constructed vertex set Γ must be an MDS for the original
digraph D. If some vertices remain unobserved after the GLR process, this set
of remaining vertices is unique and is independent of the particular order in
which the GLR rules are applied.
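
The three rules above can be sketched in a few lines of Python. This is an
illustrative sketch rather than the authors' implementation: the name
generalized_leaf_removal is hypothetical, vertices are labeled 0..N-1, and the
rules are applied by repeated sweeps instead of an efficient work queue. As an
implementation choice (an assumption of this sketch), arcs pointing into
observed vertices are dropped as soon as their target is observed, which does
not change when any rule applies because the rules only refer to predecessors
of unobserved vertices and to unobserved successors.

```python
def generalized_leaf_removal(n, arcs):
    """Apply GLR Rules 1-3 until none applies.

    Returns (occupied, core): the occupied vertex set and the set of
    vertices that are still unobserved when the process stops.
    """
    pred = [set() for _ in range(n)]
    succ = [set() for _ in range(n)]
    for i, j in arcs:
        if i != j:                        # self-loops never help domination
            succ[i].add(j)
            pred[j].add(i)
    occupied, observed = set(), set()

    def occupy(v):
        occupied.add(v)
        newly_observed = {v} | succ[v]    # succ[v] holds only unobserved vertices
        observed.update(newly_observed)
        for u in newly_observed:
            # Drop every arc pointing into a vertex that is now observed.
            for p in pred[u]:
                succ[p].discard(u)
            pred[u].clear()

    changed = True
    while changed:
        changed = False
        for v in range(n):
            if v in occupied:
                continue
            if v not in observed:
                if not pred[v]:                            # Rule 1
                    occupy(v)
                    changed = True
                elif len(pred[v]) == 1 and not succ[v]:    # Rule 2
                    (k,) = pred[v]
                    occupy(k)
                    changed = True
            elif len(succ[v]) == 1:                        # Rule 3
                (m,) = succ[v]
                succ[v].discard(m)
                pred[m].discard(v)
                changed = True
    core = set(range(n)) - observed
    return occupied, core


print(generalized_leaf_removal(3, [(0, 1), (1, 2)]))          # directed path
print(generalized_leaf_removal(3, [(0, 1), (1, 2), (2, 0)]))  # directed 3-cycle
```

On the directed path all three vertices end up observed, so the returned set is
an MDS. On the directed 3-cycle no rule applies and all three vertices remain
in the core.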


2.1 Core percolation transition 

Y. Habibulla, J.-H. Zhao, and H.-J. Zhou

[Figure 2: two panels; horizontal axis: arc density a.]

Fig. 2. GLR-induced core percolation transition in Erdős-Rényi (left panel) and
random regular (right panel) digraphs. w is the fraction of occupied vertices,
$n_{\rm core}$ is the fraction of remaining unobserved vertices. Cross symbols
are results obtained on a single digraph with $N = 10^6$ vertices and M = aN
arcs; lines (left panel) and plus symbols connected by lines (right panel) are
mean-field theoretical results for $N = \infty$.

We apply GLR on a set of random Erdős-Rényi (ER) digraphs and random regular
(RR) digraphs (see Fig. 2) and also on a set of real-world directed networks
(see Table 1). To generate an ER digraph of size N and arc density a, we first
select aN different pairs of vertices totally at random from the set of
N(N - 1)/2 possible pairs, and then create an arc of random direction between
each selected vertex pair. Similarly, to generate an RR digraph, we first
generate an undirected RR graph with every vertex having the same integer
number (= 2a) of edges, and then randomly specify a direction for each
undirected edge.
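
The ER construction described in this subsection can be sketched as follows.
This is a minimal illustration, not the authors' generator: the function name
random_er_digraph and the seed parameter are assumptions of the sketch,
vertices are labeled 0..N-1, and the number of arcs is taken as round(a * N).

```python
import random


def random_er_digraph(n, a, seed=None):
    """Return a list of arcs for a random ER digraph with about a*n arcs.

    Distinct unordered vertex pairs are drawn uniformly at random, then each
    selected pair receives one arc with a random direction.
    """
    rng = random.Random(seed)
    m = round(a * n)
    if m > n * (n - 1) // 2:
        raise ValueError("too many arcs requested for this number of vertices")
    pairs = set()
    while len(pairs) < m:                 # rejection sampling of distinct pairs
        i, j = rng.sample(range(n), 2)    # two different vertices
        pairs.add((min(i, j), max(i, j)))
    arcs = []
    for i, j in pairs:
        arcs.append((i, j) if rng.random() < 0.5 else (j, i))
    return arcs


arcs = random_er_digraph(1000, 2.5, seed=1)
print(len(arcs))
```

Rejection sampling is adequate here because the digraphs of interest are
sparse (M is of order N, far below N(N - 1)/2).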

If the arc density a of an ER digraph is less than 1.852, or that of an RR
digraph is less than 2.0, an MDS can be constructed by applying GLR alone.
However, if a > 1.852 for an ER digraph or a > 2.0 for an RR digraph, GLR
only constructs a partial dominating set for the digraph, and a fraction
$n_{\rm core}$ of vertices remain unobserved after the termination of GLR. For
ER digraphs $n_{\rm core}$ increases continuously from zero as a exceeds 1.852.
The sub-digraph induced by all these unobserved vertices and all their
predecessor vertices is referred to as the core of digraph D.

We develop a percolation theory to quantitatively understand the GLR dynamics
on random digraphs. For theoretical simplicity we consider a GLR process
carried out in discrete time steps t = 0, 1, .... In each time step t, first
Rule 1 is applied to all the eligible vertices, then Rule 2 is applied to all
the eligible vertices, then Rule 3 is applied to all the eligible arcs, and
finally all the newly occupied vertices and their attached arcs are deleted
from digraph D. The fraction w of occupied vertices during the whole GLR
process and the fraction $n_{\rm core}$ of remaining unobserved vertices are
quantitatively predicted by this mean-field theory (see the Appendix for
technical details). These theoretical predictions are in complete agreement
with simulation results on single digraph instances (Fig. 2). We believe that
when there is no core ($n_{\rm core} = 0$), the MDS relative size w as
predicted by our theory is the exact ensemble-averaged result for
finite-connectivity random digraphs.

Table 1. Constructing dominating sets for several real-world network instances
containing N vertices and M = aN arcs. For each graph, we list the number of
unobserved vertices after the GLR process (Core), and the size of the
dominating set obtained by a single run of the greedy algorithm (Greedy), the
hybrid algorithm (Hybrid), and the BPD algorithm at fixed re-weighting
parameter x = 8.0 (BPD). Epinions1 [16] and WikiVote [17,18] are two social
networks, Email [19] and WikiTalk [17,18] are two communication networks, HepPh
and HepTh [20] are two research citation networks, Google and Stanford [21] are
two webpage connection networks, and Gnutella31 [22] is a peer-to-peer network.

Network        N         M         a        Core     Greedy    Hybrid    BPD
Epinions1      75879     405740    5.347    348      37172     37128     37127
WikiVote       7115      100762    14.162   7        4786      4784      4784
Email          265214    364481    1.374    0        203980    203980    203980
WikiTalk       2394385   4659565   1.946    72       63617     63614     63614
HepPh          34546     420877    12.183   982      9628      9518      9512
HepTh          27770     352285    12.686   1900     7302      7213      7203
Google         875713    4322051   4.935    98473    315585    314201    313986
Stanford       281903    1992636   7.069    68947    90403     89388     89466
Gnutella31     62586     147892    2.363    26       12939     12784     12784


2.2 The hybrid algorithm 


The GLR process cannot construct an MDS for the whole digraph D if it contains
a core. For such a difficult case we combine GLR with a simple greedy
process to construct a dominating set that is not necessarily an MDS. We define
the impact of an unoccupied vertex as the number of newly observed vertices
caused by occupying this vertex. For example, an unobserved vertex
with three unobserved successors has impact 4, while an observed vertex with
three unobserved successors has impact 3. Our hybrid algorithm has two modes,
the default mode and the greedy mode. In the default mode, the digraph is
iteratively simplified by occupying vertices according to the microscopic rules
of GLR. If there are still unobserved vertices after this process, the
algorithm first switches to the greedy mode, in which the digraph is simplified
by occupying a vertex randomly chosen from the subset of highest-impact
vertices, and then switches back to the default mode. The two modes alternate
until every vertex is observed.
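
The impact measure and the greedy mode described above can be illustrated with
a short Python sketch. It is not the authors' code: the names impact and
greedy_dominating_set are hypothetical, vertices are labeled 0..N-1, and the
sketch implements only the pure greedy process (the hybrid algorithm would
interleave GLR sweeps between greedy steps). Impacts are recomputed from
scratch at every step for clarity, which is slower than an incremental update.

```python
import random


def impact(v, succ, observed):
    """Number of vertices that become newly observed if v is occupied."""
    gain = sum(1 for u in succ[v] if u not in observed)
    return gain + (0 if v in observed else 1)


def greedy_dominating_set(n, arcs, seed=None):
    """Occupy a random highest-impact vertex until every vertex is observed."""
    rng = random.Random(seed)
    succ = [set() for _ in range(n)]
    for i, j in arcs:
        if i != j:
            succ[i].add(j)
    occupied, observed = set(), set()
    while len(observed) < n:
        scores = {v: impact(v, succ, observed)
                  for v in range(n) if v not in occupied}
        best = max(scores.values())
        v = rng.choice([u for u, s in scores.items() if s == best])
        occupied.add(v)
        observed.add(v)
        observed.update(succ[v])
    return occupied


# The two impact values quoted in the text, for a vertex with three successors.
succ = [{1, 2, 3}, set(), set(), set()]
print(impact(0, succ, observed=set()))   # vertex 0 unobserved
print(impact(0, succ, observed={0}))     # vertex 0 already observed
```

The loop always terminates because an unobserved vertex is unoccupied and has
impact at least 1, so each greedy step observes at least one new vertex.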

The hybrid algorithm can be regarded as an extension of the pure greedy
algorithm, which always works in the greedy mode. The simulation results
obtained by the hybrid algorithm and the pure greedy algorithm are shown in
Fig. 3 for random digraphs and in Table 1 for real-world network instances. The
hybrid algorithm improves over the greedy algorithm considerably on random
digraph instances when the arc density a < 10. But when the relative size
$n_{\rm core}$ of the core in the digraph is close to 1, the hybrid algorithm
only slightly outperforms the pure greedy algorithm.

Fig. 3. Relative sizes w of dominating sets for Erdős-Rényi (left panel) and
random regular (right panel) digraphs. We compare the mean sizes of 96
dominating sets obtained by the Greedy, the Hybrid, and the BPD algorithm on 96
digraph instances of size $N = 10^5$ and arc density a (fluctuations to the
mean are of order $10^{-4}$ and are not shown). The MDS relative sizes
predicted by the replica-symmetric theory are also shown. The re-weighting
parameter is fixed to x = 10.0 for ER digraphs and to x = 8.0 for RR digraphs.
The vertical dashed lines mark the core-percolation transition point
a ≈ 1.852 for ER digraphs and a = 2.0 for RR digraphs.


3 Spin Glass Model and Belief-Propagation 


We now introduce a spin glass model for the directed MDS problem and solve it
by the replica-symmetric mean field theory, which is based on the Bethe-Peierls
approximation [23,24] but can also be derived without any physical assumptions
through partition function expansion [25,26]. We define a partition function
Z(x) for a given input digraph D as follows:

$$
Z(x) = \sum_{c} \prod_{i \in V} e^{-x c_i}
  \Bigl[ 1 - (1 - c_i) \prod_{j \in \partial i^+} (1 - c_j) \Bigr] . \tag{1}
$$

The summation in this expression is over all the microscopic configurations
$c \equiv \{c_1, c_2, \ldots, c_N\}$ of the N vertices, with
$c_i \in \{0, 1\}$ being the state of vertex i ($c_i = 0$, empty; $c_i = 1$,
occupied). A configuration c has zero contribution to Z(x) if it does not
satisfy all the vertex constraints; if it does satisfy all these constraints
and therefore is equivalent to a dominating set, it contributes a statistical
weight $e^{-x W(c)}$, with $W(c) \equiv \sum_{i \in V} c_i$ being the total
number of occupied vertices. When the positive re-weighting parameter x is
sufficiently large, Z(x) is overwhelmingly dominated by the MDS configurations.
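
For very small digraphs the partition function defined above can be evaluated
by direct enumeration, which makes the role of the re-weighting parameter x
concrete. The sketch below is only an illustration under stated assumptions:
the function name partition_function is hypothetical, vertices are labeled
0..N-1, and the cost grows as 2^N, so it is unusable beyond a few tens of
vertices.

```python
import math
from itertools import product


def partition_function(n, arcs, x):
    """Sum exp(-x * W(c)) over all configurations c that are dominating sets."""
    pred = [set() for _ in range(n)]
    for i, j in arcs:
        pred[j].add(i)
    z = 0.0
    for c in product((0, 1), repeat=n):
        # Vertex i is satisfied if it is occupied or has an occupied predecessor.
        if all(c[i] or any(c[j] for j in pred[i]) for i in range(n)):
            z += math.exp(-x * sum(c))
    return z


# Directed path 0 -> 1 -> 2: its dominating sets are {0,1}, {0,2} and {0,1,2}.
path = [(0, 1), (1, 2)]
for x in (1.0, 5.0, 20.0):
    z = partition_function(3, path, x)
    print(x, -math.log(z) / x)   # free energy -(1/x) ln Z(x)
```

For this path the MDS size is 2 and there are two MDS configurations, so
-(1/x) ln Z(x) approaches 2 from below as x grows, with a gap of roughly
(ln 2)/x.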

We define on each arc (i, j) of digraph D a distribution function
$q_{i \to j}^{c_i, c_j}$, which is the probability of vertex i being in state
$c_i$ and vertex j being in state $c_j$ if all the other attached arcs of j are
deleted and the constraint of j is relaxed, and another distribution function
$q_{j \leftarrow i}^{c_j, c_i}$, which is the probability of i being in state
$c_i$ and j being in state $c_j$ if all the other attached arcs of i are
deleted and the constraint of i is relaxed. Assuming all the neighboring
vertices of any vertex i are mutually independent of each other when the
constraint of vertex i is relaxed (the Bethe-Peierls approximation), then when
this constraint is present, the marginal probability $q_i^{c_i}$ of vertex i
being in state $c_i$ is estimated by

$$
q_i^{c_i} = \frac{1}{z_i} \, e^{-x c_i}
  \Bigl[ \prod_{j \in \partial i^+} \sum_{c_j} q_{j \to i}^{c_j, c_i}
       - \delta_{c_i}^{0} \prod_{j \in \partial i^+} q_{j \to i}^{0, 0} \Bigr]
  \prod_{k \in \partial i^-} \sum_{c_k} q_{k \leftarrow i}^{c_k, c_i} , \tag{2}
$$


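where $z_i$ is a normalization constant, and $\delta_m^n$ is the Kronecker
symbol with $\delta_m^n = 1$ if m = n and $\delta_m^n = 0$ otherwise. Under the
same approximation we can derive the following belief-propagation (BP)
equations on each arc (i, j):

$$
q_{i \to j}^{c_i, c_j} = \frac{1}{z_{i \to j}} \, e^{-x c_i}
  \Bigl[ \prod_{k \in \partial i^+} \sum_{c_k} q_{k \to i}^{c_k, c_i}
       - \delta_{c_i}^{0} \prod_{k \in \partial i^+} q_{k \to i}^{0, 0} \Bigr]
  \prod_{l \in \partial i^- \backslash j} \sum_{c_l} q_{l \leftarrow i}^{c_l, c_i} , \tag{3a}
$$

$$
q_{j \leftarrow i}^{c_j, c_i} = \frac{1}{z_{j \leftarrow i}} \, e^{-x c_j}
  \Bigl[ \prod_{k \in \partial j^+ \backslash i} \sum_{c_k} q_{k \to j}^{c_k, c_j}
       - \delta_{c_i + c_j}^{0} \prod_{k \in \partial j^+ \backslash i} q_{k \to j}^{0, 0} \Bigr]
  \prod_{l \in \partial j^-} \sum_{c_l} q_{l \leftarrow j}^{c_l, c_j} , \tag{3b}
$$

where $z_{i \to j}$ and $z_{j \leftarrow i}$ are also normalization constants,
and $\partial j^+ \backslash i$ is the vertex set obtained after removing i
from $\partial j^+$. We can easily verify that
$q_{i \to j}^{c_i, 0} = q_{i \to j}^{c_i, 1}$ for $c_i = 0$ or 1, and that
$q_{j \leftarrow i}^{1, 0} = q_{j \leftarrow i}^{1, 1}$.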

We let Eqs. (2) and (3) guide our construction of a near-optimal dominating
set Γ through a belief propagation decimation algorithm. This BPD algorithm is
implemented in the same way as the BPD algorithm for undirected graphs [13],
therefore its implementation details are omitted here (the source code is
available upon request). Roughly speaking, at each iteration step of BPD we
first iterate Eq. (3) for several rounds, then we estimate the occupation
probabilities for all the unoccupied vertices using Eq. (2), and then we occupy
those vertices whose estimated occupation probabilities are the highest. Such a
BPD process is repeated on the input digraph until all the vertices are
observed. The results of this message-passing algorithm are shown in Fig. 3 for
random digraphs and in Table 1 for real-world networks.
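
The decimation loop described above can be illustrated on tiny digraphs
without any message passing. The sketch below deliberately swaps the BP
estimate of the occupation probabilities for exact marginals obtained by brute
force enumeration, so it is not the BPD algorithm itself, only the same
occupy-the-most-probable-vertex loop. The function names are hypothetical,
vertices are labeled 0..N-1, one vertex is occupied per round, and the cost
grows as 2^N.

```python
import math
from itertools import product


def exact_occupation_probabilities(n, pred, x, occupied):
    """Exact P(c_i = 1) under weight exp(-x W(c)), restricted to dominating
    sets that contain all vertices already occupied by earlier rounds."""
    z = 0.0
    weight_occupied = [0.0] * n
    for c in product((0, 1), repeat=n):
        if any(c[i] == 0 for i in occupied):
            continue                      # respect earlier decimation choices
        if all(c[i] or any(c[j] for j in pred[i]) for i in range(n)):
            w = math.exp(-x * sum(c))
            z += w
            for i in range(n):
                if c[i]:
                    weight_occupied[i] += w
    return [wi / z for wi in weight_occupied]


def decimation_with_exact_marginals(n, arcs, x=8.0):
    """Repeatedly occupy the unoccupied vertex with the highest marginal."""
    pred = [set() for _ in range(n)]
    succ = [set() for _ in range(n)]
    for i, j in arcs:
        pred[j].add(i)
        succ[i].add(j)
    occupied, observed = set(), set()
    while len(observed) < n:
        q = exact_occupation_probabilities(n, pred, x, occupied)
        v = max((i for i in range(n) if i not in occupied), key=lambda i: q[i])
        occupied.add(v)
        observed.add(v)
        observed.update(succ[v])
    return occupied


print(decimation_with_exact_marginals(3, [(0, 1), (1, 2)]))
```

In BPD the call to exact_occupation_probabilities would be replaced by a few
rounds of BP iteration followed by the marginal estimate, which is what makes
the approach feasible on large digraphs.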

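If we can find a fixed point for the set of BP equations at a given value of the
re-weighting parameter x, we can then compute the mean fraction w of occupied
vertices as $w = (1/N) \sum_{i \in V} q_i^1$. The total free energy
$F = -(1/x) \ln Z(x)$ can be evaluated as the total vertex contributions minus
the total arc contributions:

$$
F = -\frac{1}{x} \sum_{i \in V} \ln \Bigl\{ \sum_{c_i} e^{-x c_i}
      \Bigl[ \prod_{j \in \partial i^+} \sum_{c_j} q_{j \to i}^{c_j, c_i}
           - \delta_{c_i}^{0} \prod_{j \in \partial i^+} q_{j \to i}^{0, 0} \Bigr]
      \prod_{k \in \partial i^-} \sum_{c_k} q_{k \leftarrow i}^{c_k, c_i} \Bigr\}
    + \frac{1}{x} \sum_{(i,j) \in A} \ln \Bigl[ \sum_{c_i, c_j}
      q_{i \to j}^{c_i, c_j} \, q_{j \leftarrow i}^{c_j, c_i} \Bigr] . \tag{4}
$$

The entropy density s of the system is then estimated through s = x(w - F/N).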
For a given ensemble of random digraphs, the ensemble-averaged occupation
fraction w and entropy density s at each fixed value of x can also be obtained
from Eqs. (2), (3) and (4) through population dynamics simulation [13]. Both w
and s decrease with x, and s may become negative as x exceeds a certain
critical value. The value of w at this critical point of x is then taken as the
ensemble-averaged MDS relative size $w_0$ (very likely it is only a lower bound
to $w_0$). For example, at arc density a = 5 the entropy density of ER digraphs
decreases to zero at x ≈ 9.9, at which point w ≈ 0.195. These
ensemble-averaged results for random ER and RR digraphs are also shown in
Fig. 3. We notice that the BPD results and the replica-symmetric mean field
results almost superimpose on each other, suggesting that dominating sets
obtained by the BPD algorithm are extremely close to optimal.
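
The relation s = x(w - F/N) used in this section follows from the standard
Gibbs entropy of the weighted ensemble of dominating sets. A short derivation
in the notation of Eq. (1) is sketched below; here $\langle W \rangle$ denotes
the mean number of occupied vertices under the weight $e^{-x W(c)}$, a symbol
introduced only for this sketch.

```latex
% Probability of a dominating-set configuration c, and the free energy
P(c) = \frac{e^{-x W(c)}}{Z(x)} , \qquad F = -\frac{1}{x} \ln Z(x) .

% Gibbs entropy of this distribution
S = -\sum_{c} P(c) \ln P(c)
  = x \langle W \rangle + \ln Z(x)
  = x \bigl( \langle W \rangle - F \bigr) .

% Per vertex, with w = \langle W \rangle / N
s = \frac{S}{N} = x \Bigl( w - \frac{F}{N} \Bigr) .
```

Since S is the logarithm of an effective number of configurations, a negative
estimate of s signals that the value of x has been pushed beyond the point
where the replica-symmetric estimate still describes actual dominating sets,
which is why w at s = 0 is used as the estimate of the MDS relative size.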


4 Conclusion 

In this paper we studied the directed dominating set problem by a core
percolation theory and a replica-symmetric mean field theory, and proposed a
generalized leaf-removal local algorithm and a BPD message-passing algorithm to
construct near-optimal dominating sets for single digraph instances. We expect
these theoretical and algorithmic results to be useful for many future practical
applications.

The spin glass model (1) was treated in this paper only at the
replica-symmetric mean field level. It should be interesting to extend the
theoretical investigations to the level of replica-symmetry-breaking for a more
complete understanding of this spin glass system. The replica-symmetry-breaking
mean field theory can also lead to other message-passing algorithms that perform
even better than the BPD algorithm [23] (the review paper [25] offers a
demonstration of this point for the minimum vertex-cover problem).

Appendix: Mean field equations for the GLR process


The mean field theory for the directed GLR process is a simple extension of
the same theory presented in [13] for undirected graphs. Therefore here we only
list the main equations of this theory but do not give the derivation details.
We denote by $P(k_+, k_-)$ the probability that a randomly chosen vertex of a
digraph has in-degree $k_+$ and out-degree $k_-$. Similarly, the in- and
out-degree joint probabilities of the predecessor vertex i and successor vertex
j of a randomly chosen arc (i, j) of the digraph are denoted as
$Q_+(k_+, k_-)$ and $Q_-(k_+, k_-)$, respectively. We assume that there is no
structural correlation in the digraph, therefore

$$
Q_+(k_+, k_-) = \frac{k_- P(k_+, k_-)}{a} , \qquad
Q_-(k_+, k_-) = \frac{k_+ P(k_+, k_-)}{a} , \tag{5}
$$

where $a \equiv \sum_{k_+, k_-} k_+ P(k_+, k_-) = \sum_{k_+, k_-} k_- P(k_+, k_-)$
is the arc density.
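
As a concrete check of the degree-biased distributions defined above, the
following Python sketch computes P, Q_+ and Q_- empirically from an arc list.
The function name degree_distributions is an assumption of this sketch,
vertices are labeled 0..N-1, degree pairs are stored as (in-degree,
out-degree), and at least one arc is required so that a > 0.

```python
from collections import Counter


def degree_distributions(n, arcs):
    """Return (a, P, Q_plus, Q_minus) with keys (k_plus, k_minus).

    P is the joint in/out-degree distribution of a random vertex, Q_plus that
    of the predecessor end of a random arc, and Q_minus that of the successor
    end of a random arc.
    """
    indeg = [0] * n
    outdeg = [0] * n
    for i, j in arcs:
        outdeg[i] += 1
        indeg[j] += 1
    a = len(arcs) / n                                   # arc density M/N
    P = {k: cnt / n for k, cnt in Counter(zip(indeg, outdeg)).items()}
    Q_plus = {k: k[1] * p / a for k, p in P.items() if k[1] > 0}
    Q_minus = {k: k[0] * p / a for k, p in P.items() if k[0] > 0}
    return a, P, Q_plus, Q_minus


a, P, Q_plus, Q_minus = degree_distributions(4, [(0, 1), (0, 2), (2, 3)])
print(a, sum(Q_plus.values()), sum(Q_minus.values()))
```

Both Q_+ and Q_- sum to one because the total in-degree and the total
out-degree each equal the number of arcs M = aN.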

Consider a randomly chosen arc (i, j) from vertex i to vertex j. Suppose vertex
i is always unobserved; then we denote by $\alpha_t$ the probability that vertex
j becomes an unobserved leaf vertex (i.e., it has no unobserved successor and
has only a single predecessor) at the t-th GLR evolution step, and by
$\gamma_{[0,t]}$ the probability that j has been observed at the end of the
t-th GLR step. Similarly, suppose the successor vertex j of a randomly chosen
arc (i, j) is always unobserved; we denote by $\beta_{[0,t]}$ the probability
that the predecessor vertex i has been occupied at the end of the t-th GLR
step, and by $\eta_t$ the probability that at the end of the t-th GLR step
vertex i becomes observed but unoccupied and has no other unobserved successors
except vertex j. These four sets of probabilities are related by the following
set of iterative equations:

$$
\alpha_t = \delta_t^0 \, Q_-(1, 0) + \sum_{k_+, k_-} Q_-(k_+, k_-) \Bigl\{
  \delta_t^1 \Bigl[ (\eta_0)^{k_+ - 1} (\gamma_{[0,0]})^{k_-}
                   - \delta_{k_+}^{1} \delta_{k_-}^{0} \Bigr]
  + (1 - \delta_t^0 - \delta_t^1) \Bigl[
      \Bigl( \sum_{t'=0}^{t-1} \eta_{t'} \Bigr)^{k_+ - 1} (\gamma_{[0,t-1]})^{k_-}
    - \Bigl( \sum_{t'=0}^{t-2} \eta_{t'} \Bigr)^{k_+ - 1} (\gamma_{[0,t-2]})^{k_-}
  \Bigr] \Bigr\} , \tag{6a}
$$

$$
\beta_{[0,t]} = 1 - \sum_{k_+, k_-} Q_+(k_+, k_-) \Bigl\{
  \delta_t^0 \, (1 - \delta_{k_+}^{0}) (1 - \alpha_0)^{k_- - 1}
  + (1 - \delta_t^0) \Bigl[ 1 - \Bigl( \sum_{t'=0}^{t-1} \eta_{t'} \Bigr)^{k_+} \Bigr]
    \Bigl( 1 - \sum_{t'=0}^{t} \alpha_{t'} \Bigr)^{k_- - 1} \Bigr\} , \tag{6b}
$$

$$
\gamma_{[0,t]} = 1 - \sum_{k_+, k_-} Q_-(k_+, k_-) \,
  (1 - \beta_{[0,t]})^{k_+ - 1}
  \Bigl( 1 - \sum_{t'=0}^{t} \alpha_{t'} \Bigr)^{k_-} , \tag{6c}
$$

$$
\eta_t = \delta_t^0 \sum_{k_+, k_-} Q_+(k_+, k_-)
    \bigl[ 1 - (1 - \beta_{[0,0]})^{k_+} \bigr] (\gamma_{[0,0]})^{k_- - 1}
  + (1 - \delta_t^0) \sum_{k_+, k_-} Q_+(k_+, k_-) \Bigl\{
      \bigl[ 1 - (1 - \beta_{[0,t]})^{k_+} \bigr] (\gamma_{[0,t]})^{k_- - 1}
    - \bigl[ 1 - (1 - \beta_{[0,t-1]})^{k_+} \bigr] (\gamma_{[0,t-1]})^{k_- - 1}
  \Bigr\} . \tag{6d}
$$


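Let us define $\alpha_{\rm cum} \equiv \sum_{t \geq 0} \alpha_t$,
$\beta_{\rm cum} \equiv \beta_{[0,\infty]}$,
$\gamma_{\rm cum} \equiv \gamma_{[0,\infty]}$ and
$\eta_{\rm cum} \equiv \sum_{t \geq 0} \eta_t$ as the cumulative probabilities
over the whole GLR process. From Eq. (6) we can verify that these four
cumulative probabilities satisfy the following self-consistent equations:

$$
\alpha_{\rm cum} = \sum_{k_+, k_-} Q_-(k_+, k_-) \,
  (\eta_{\rm cum})^{k_+ - 1} (\gamma_{\rm cum})^{k_-} , \tag{7a}
$$

$$
\beta_{\rm cum} = 1 - \sum_{k_+, k_-} Q_+(k_+, k_-)
  \bigl[ 1 - (\eta_{\rm cum})^{k_+} \bigr] (1 - \alpha_{\rm cum})^{k_- - 1} , \tag{7b}
$$

$$
\gamma_{\rm cum} = 1 - \sum_{k_+, k_-} Q_-(k_+, k_-) \,
  (1 - \beta_{\rm cum})^{k_+ - 1} (1 - \alpha_{\rm cum})^{k_-} , \tag{7c}
$$

$$
\eta_{\rm cum} = \sum_{k_+, k_-} Q_+(k_+, k_-)
  \bigl[ 1 - (1 - \beta_{\rm cum})^{k_+} \bigr] (\gamma_{\rm cum})^{k_- - 1} . \tag{7d}
$$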

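The fraction $n_{\rm core}$ of vertices that remain unobserved at the end of the
GLR process is

$$
n_{\rm core} = \sum_{k_+, k_-} P(k_+, k_-)
    \bigl[ (1 - \beta_{\rm cum})^{k_+} - (\eta_{\rm cum})^{k_+} \bigr]
    (1 - \alpha_{\rm cum})^{k_-}
  - \sum_{k_+, k_-} P(k_+, k_-) \, k_+ \,
    (1 - \beta_{\rm cum} - \eta_{\rm cum}) \,
    (\eta_{\rm cum})^{k_+ - 1} (\gamma_{\rm cum})^{k_-} . \tag{8}
$$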
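
For ER digraphs the self-consistent equations for the cumulative probabilities
can be iterated numerically. The sketch below rests on two assumptions that
are not stated in the text: that the in-degree and out-degree of an ER digraph
are independent Poisson variables with mean a, so that each degree sum becomes
an exponential (a Poisson variable K of mean a has average s^K equal to
exp(-a(1 - s))), and that the equations read as follows in that limit:
alpha = exp(-a(1-eta)) exp(-a(1-gamma)),
beta = 1 - [1 - exp(-a(1-eta))] exp(-a alpha),
gamma = 1 - exp(-a beta) exp(-a alpha),
eta = [1 - exp(-a beta)] exp(-a(1-gamma)).
The function name er_core_fraction and the plain fixed-point iteration started
from zero are choices of this sketch, not the authors' procedure.

```python
import math


def er_core_fraction(a, max_iter=100000, tol=1e-13):
    """Iterate the cumulative GLR equations for a Poissonian ER digraph of
    arc density a and return the predicted core fraction n_core."""
    alpha = beta = gamma = eta = 0.0
    for _ in range(max_iter):
        old = (alpha, beta, gamma, eta)
        alpha = math.exp(-a * (1.0 - eta)) * math.exp(-a * (1.0 - gamma))
        beta = 1.0 - (1.0 - math.exp(-a * (1.0 - eta))) * math.exp(-a * alpha)
        gamma = 1.0 - math.exp(-a * beta) * math.exp(-a * alpha)
        eta = (1.0 - math.exp(-a * beta)) * math.exp(-a * (1.0 - gamma))
        new = (alpha, beta, gamma, eta)
        if max(abs(u - v) for u, v in zip(old, new)) < tol:
            break
    # Poissonian form of the core-fraction expression.
    still_unobserved = ((math.exp(-a * beta) - math.exp(-a * (1.0 - eta)))
                        * math.exp(-a * alpha))
    becomes_leaf = (a * (1.0 - beta - eta)
                    * math.exp(-a * (1.0 - eta)) * math.exp(-a * (1.0 - gamma)))
    return still_unobserved - becomes_leaf


for a in (1.0, 1.5, 2.5, 3.0):
    print(a, er_core_fraction(a))
```

Under these assumptions the iteration settles on a solution with
beta + eta = 1 and alpha + gamma = 1 at low arc density, for which the core
fraction vanishes, and on a solution with a finite core at high arc density.
Convergence becomes slow close to the transition point.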

The fraction w of vertices that are occupied during the whole GLR process
is evaluated through

w= 1- ^2 P ( k +’ k -)[ 1 - (Vcum) k+ ] (1 - Ct cum ) k - 

k .(_, k— 

t -1 t -1 

— p{i,o)t} 0 -J2 E p (*+’ k -) k +‘nt(52 , nt') k+ 1 C52'n i ) k ~ 

£> 1 k .|_, k— t'—0 t'—O 

-EE p ( fc +> E fc__1 1 1 - (! - EA') fc+ ] ■ (9) 

£>1 fe_|_, fc_ t'—O t'=0 



